Approximate Regular Expression Pattern

Authors

  • James R. Knight
  • Eugene W. Myers
Abstract

Given a sequence $A$ of length $M$ and a regular expression $R$ of length $P$, an approximate regular expression pattern matching algorithm computes the score of the optimal alignment between $A$ and one of the sequences $B$ exactly matched by $R$. An alignment between sequences $A = a_1 a_2 \ldots a_M$ and $B = b_1 b_2 \ldots b_N$ is a list of ordered pairs $\langle (i_1, j_1), (i_2, j_2), \ldots, (i_t, j_t) \rangle$ such that $i_k < i_{k+1}$ and $j_k < j_{k+1}$. In this case, the alignment aligns symbols $a_{i_k}$ and $b_{j_k}$, and leaves blocks of unaligned symbols, or gaps, between them. A scoring scheme $S$ associates costs for each aligned symbol pair and each gap. The alignment's score is the sum of the associated costs, and an optimal alignment is one of minimal score. There are a variety of schemes for scoring alignments. In a concave gap-penalty scoring scheme $S = \{\delta, w\}$, a function $\delta(a, b)$ gives the score of each aligned pair of symbols $a$ and $b$, and a concave function $w(k)$ gives the score of a gap of length $k$. A function $w$ is concave if and only if it has the property that for all $k > 1$, $w(k+1) - w(k) \le w(k) - w(k-1)$. In this paper we present an $O(MP(\log M + \log^2 P))$ algorithm for approximate regular expression matching for an arbitrary $\delta$ and any concave $w$.
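As a rough illustration of the definitions in the abstract, the following Python sketch scores a given alignment under a pair-scoring function and a gap-cost function, and checks the concavity condition numerically. It is not the paper's algorithm: it only evaluates one fixed alignment between two plain strings and does not handle regular expressions. The names `is_concave`, `alignment_score`, `delta`, and `w_log`, the particular cost functions, and the choice to charge the unaligned block in each sequence as a separate gap are all assumptions made for this example.

```python
import math
from typing import Callable, Sequence, Tuple


def is_concave(w: Callable[[int], float], max_k: int) -> bool:
    """Check w(k+1) - w(k) <= w(k) - w(k-1) for every k with 1 < k < max_k."""
    eps = 1e-12  # small tolerance for floating-point rounding
    return all(
        w(k + 1) - w(k) <= w(k) - w(k - 1) + eps
        for k in range(2, max_k)
    )


def alignment_score(
    a: str,
    b: str,
    pairs: Sequence[Tuple[int, int]],
    delta: Callable[[str, str], float],
    w: Callable[[int], float],
) -> float:
    """Sum the pair costs and gap costs of an alignment.

    `pairs` holds 1-based (i, j) index pairs with strictly increasing
    i and strictly increasing j. Assumption: the block of unaligned
    symbols in `a` and the block in `b` between two consecutive pairs
    are charged as two separate gaps.
    """
    score = 0.0
    prev_i, prev_j = 0, 0
    # A sentinel pair just past both ends closes any trailing gaps.
    sentinel = (len(a) + 1, len(b) + 1)
    for i, j in list(pairs) + [sentinel]:
        if not (i > prev_i and j > prev_j):
            raise ValueError("pairs must be strictly increasing in i and j")
        is_sentinel = (i, j) == sentinel
        if not is_sentinel and (i > len(a) or j > len(b)):
            raise ValueError("pair index out of range")
        gap_a = i - prev_i - 1  # unaligned symbols skipped in a
        gap_b = j - prev_j - 1  # unaligned symbols skipped in b
        if gap_a > 0:
            score += w(gap_a)
        if gap_b > 0:
            score += w(gap_b)
        if not is_sentinel:
            score += delta(a[i - 1], b[j - 1])
        prev_i, prev_j = i, j
    return score


def delta(x: str, y: str) -> float:
    """Hypothetical pair cost: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def w_log(k: int) -> float:
    """Hypothetical concave gap cost: 1 + ln(k)."""
    return 1.0 + math.log(k)


if __name__ == "__main__":
    print(is_concave(w_log, 50))             # True
    print(is_concave(lambda k: k * k, 10))   # False: k^2 is convex
    # Align "ACGT" with "AGT", leaving the "C" unaligned (one gap of length 1).
    print(alignment_score("ACGT", "AGT", [(1, 1), (3, 2), (4, 3)], delta, w_log))  # 1.0
```

An affine cost such as `w(k) = g + h*k` also passes the check, since the inequality holds with equality, so affine gap penalties are a special case of concave ones.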

Similar resources

Reporting Exact and Approximate Regular Expression Matches

While much work has been done on determining if a document or a line of a document contains an exact or approximate match to a regular expression, less effort has been expended in formulating and determining what to report as "the match" once such a "hit" is detected. For exact regular expression pattern matching, we give algorithms for finding a longest match, all symbols involved in some match, ...

A GPGPU Implementation of Approximate String Matching with Regular Expression Operators and Comparison with Its FPGA Implementation

In this paper, we propose an efficient GPGPU implementation of an algorithm for approximate string matching with regular expression operators, originally implemented on an FPGA, and compare the GPGPU, FPGA and CPU implementations by experiments. Approximate string matching with regular expression operators is used in various applications, such as full text database search and DNA sequence analy...

On Approximate Pattern Matching for a Class of Gibbs Random Fields

We prove an exponential approximation for the law of approximate occurrence of typical patterns for a class of Gibbsian sources on the lattice $\mathbb{Z}^d$, $d \ge 2$. From this result, we deduce a law of large numbers and a large deviation result for the waiting time of distorted patterns. Key-words: Gibbs measures, approximate matching, exponential law, lossy data compression, law of large numbers, larg...

Approximate Regular Expression Matching

We extend the definition of Hamming and Levenshtein distance between two strings used in approximate string matching so that these two distances can also be used in approximate regular expression matching. Next, methods for constructing nondeterministic finite automata for approximate regular expression matching under both of these distances are presented.

Publication date: 1995